contribution

All the group member participated in all the two assignments, and after discussion, formed this report.

Assignment 1

Question1

#Setup, Note: mapbox need registration
Sys.setenv('MAPBOX_TOKEN' = 'pk.eyJ1IjoieWloY2g4ODMiLCJhIjoiY2xtaG5zdHVsMDB6OTNwbzBqb2huYnA5eiJ9.bgRuRHPwhtgG63fFv0TSRQ')

#Q11
#read data
q1data = read.csv("aegypti_albopictus.csv")
#get data from different year
data_2004 = q1data[which(q1data$YEAR=="2004"),]
data_2013 = q1data[which(q1data$YEAR=="2013"),]

plot_2004 <- plot_mapbox(data_2004,x = ~X,y = ~Y,type = 'scatter', mode = 'markers', color = ~VECTOR, text = ~COUNTRY,  colors = c("orange", "blue")) %>% layout(title = 'Distribution of the two types of mosquitos in Year 2004')

plot_2013 <- plot_mapbox(data_2013,x = ~X,y = ~Y,type = 'scatter', mode = 'markers', color = ~VECTOR, text = ~COUNTRY,  colors = c("orange", "blue"))  %>% layout(title = 'Distribution of the two types of mosquitos in Year 2013')
plot_2004
plot_2013 

In Year 2004, Aedes Aegyoti had more appearance in Latin America, South Asia(India) and South-East Asia, these mainly appears on the coastal region. On the other hand, Aedes Albopictus had more observation in the US, some cases in EU and East Asia.

In Year 2013, Aedes Aegyoti had very high appearance in South America, mainly in Brazil. In contrast, Aedes Albopictus seems to has less observation, mainly in Europe and Taiwan.

One of the main perception problem is data points overlapping. For example, Taiwan has the largest number of observations in the data. However, due to the island’s small size, those observations could easily be overlooked.

Question2

#Q12
#calculate the numbers of mosquitos per country
Z <- q1data %>%
  group_by(COUNTRY_ID) %>%
  summarise(count = n())

#Plot it with reds color palette for better visualization
plot_q12<-plot_geo(Z)%>%add_trace(z = ~count, color = ~count, text = ~COUNTRY_ID, 
                                        locations = ~COUNTRY_ID, colors = 'Reds')%>% layout(
                                          geo = list(projection = list(type = "Equirectangular")))%>% 
                        layout(title = 'Numbers of mosquitos per country') 
                                          
plot_q12 

As the ranges of observation is very wide (Taiwan with 24837 and many country has only 1), the color scale cannot be easily interpreted. For example, India has 583 observations but has very similar color to it’s neighbor Pakistan with only 32 observations at first glance. ## Question3

#Q13
#Equirectangular projection plot
plot_q13_equi<-plot_geo(Z)%>%add_trace(z = ~log(count), color = ~log(count), text = ~COUNTRY_ID, 
                                  locations = ~COUNTRY_ID, colors = 'Reds')%>% layout(
                                    geo = list(projection = list(type = "Equirectangular"))) %>% 
                        layout(title = 'Equirectangular projection plot with log(Z)') 
plot_q13_equi
#Conic equal area projection plot
plot_q13_conic<-plot_geo(Z)%>%add_trace(z = ~log(count), color = ~log(count), text = ~COUNTRY_ID, 
                                   locations = ~COUNTRY_ID, colors = 'Reds')%>% layout(
                                     geo = list(projection = list(type = "conic equal area")))%>% 
                        layout(title = 'Conic equal area projection plot with log(Z)')  

plot_q13_conic

By logarithmizing the counts, the map in Question 3 is much easier for the reader to tell the difference between the intensity of different country. This improvement is due to the reduction in the scale range.
In terms of the advantages and disadvantages. For 3a(Equirectangular projection), the advantage will be this is the most common type of projection, so the reader can understand it much straight forward. However, since this type of projection is neither equal area nor conformal. The downside is it will distort the area of the land, especially in higher latitude , for example, the country like Canada will looks larger than it actually is.

For 3b(Conic equal area projection). The advantage will be the area of different country are preserved accurately. On the other hand, this type of projection will still distort distances and angles. This might become a problem when doing research on the effect of geolocation, since two countries that are closed might looks far away on this type of projection.

Question4

#Use contry ID to filter only Brazil
bra_2013 <-  data_2013[which(data_2013$COUNTRY_ID=="BRA"),]
# Create X1 Y1, N and so on
bra_2013_summary <- bra_2013%>%
  mutate(X1 = cut_interval(X, n = 100))%>%
  mutate(Y1 = cut_interval(Y, n = 100))%>%
  group_by(X1, Y1) %>%
  summarize(
    mean_X = mean(X),
    mean_Y = mean(Y),
    n = n(),
  .groups = 'drop'
  )
#using MapBox to plot
plot_mapbox(bra_2013_summary,x = ~mean_X,y = ~mean_Y,color =~n,type = 'scatter', mode = 'markers')%>%
  layout(
    mapbox = list(
      zoom = 2,
      center = list(lat = -15, lon = -55)
    )
  )

The eastern coastal region near Recife and the southern part near Sao Paulo are the areas most heavily affected by mosquitoes. Discretization does improve the analysis because it reduces overlap and provides a clearer view of the data points with color scaling.

Assignment 2

Question1

geo<-fromJSON(file="gadm41_SWE_1.json")
swe_income<-read.csv("swe_income.csv",encoding = "latin1")
income<-data.frame(region=unique(swe_income$region),
                   Young=swe_income$X2016[swe_income$age=="18-29 years"],
                   Adult=swe_income$X2016[swe_income$age=="30-49 years"],
                   Senior=swe_income$X2016[swe_income$age=="50-64 years"])

head(income)
##                   region Young Adult Senior
## 1    01 Stockholm county 418.9 716.4  742.5
## 2      03 Uppsala county 327.6 589.9  631.1
## 3 04 Södermanland county 346.0 532.0  551.6
## 4 05 Östergötland county 316.0 546.4  579.4
## 5    06 Jönköping county 359.6 563.7  605.0
## 6    07 Kronoberg county 334.4 546.9  577.2

Question2

violin<-plot_ly(swe_income, x=~factor(age), y=~X2016, split=~factor(age),
            type="violin", box=list(visible=T))%>%
  layout(title="mean income distributions per age group",
         xaxis=list(title="age"),
         yaxis=list(title="mean incomes"))
violin

As the age group increases, the mean incomes of Swedish households also increase(the median values for the three groups are 334.4, 531, 551.6 respectively).The income of the young group is much less than the other two groups. In addition, the wide section of each violin plot also shows that there is a high probability that the value will fall in this place.

Question3

#using the template from course website
attach(income)
s=interp(x=Young,y=Adult,z=Senior, duplicate = "mean")
detach(income)

surface<-plot_ly(x=~s$x, y=~s$y, z=~s$z, type="surface")%>%
  layout(title="dependence of Senior incomes on Adult and Young incomes",
         scene=list(xaxis=list(title="Young"),
                    yaxis=list(title="Adult"),
                    zaxis=list(title="Senior")))%>%
  colorbar(title="Senior")
surface

From the plot, it seems that there is a dependence of Senior incomes on Adult and Young income. As the Adult and Young income increase, the Senior income also becomes bigger and bigger. When we rotate the plot, the surface looks flat. Hence, we consider that linear regression would be suitable to model this dependence.

Question4

#See the structure of some region:
geo<-fromJSON(file="gadm41_SWE_1.json")
print(geo$features[[2]]$properties)
## $GID_1
## [1] "SWE.2_1"
## 
## $GID_0
## [1] "SWE"
## 
## $COUNTRY
## [1] "Sweden"
## 
## $NAME_1
## [1] "Dalarna"
## 
## $VARNAME_1
## [1] "Dalecarlia|Kopparberg"
## 
## $NL_NAME_1
## [1] "NA"
## 
## $TYPE_1
## [1] "Län"
## 
## $ENGTYPE_1
## [1] "County"
## 
## $CC_1
## [1] "NA"
## 
## $HASC_1
## [1] "SE.KO"
## 
## $ISO_1
## [1] "NA"
library(stringr)
## Warning: 套件 'stringr' 是用 R 版本 4.2.3 來建造的
#change the region contents, only keep the name
newregion<-str_remove_all(income$region,"county")
newregion<-gsub("[[:digit:]]+",'',newregion)
newregion<-str_remove_all(newregion," ")
newincome<-income
newincome$region<-newregion
newincome$region[14]<-"Orebro"
#plot choropleth graph
g=list(fitbounds="locations", visible=FALSE)
youngp<-plot_geo(newincome)%>%
  add_trace(type="choropleth",
            geojson=geo,
            locations=~region,
            z=~Young,
            featureidkey="properties.NAME_1")%>%
  layout(geo=g,
         title="incomes of Young in Sweden")
  
youngp
g=list(fitbounds="locations", visible=FALSE)
adultp<-plot_geo(newincome)%>%
  add_trace(type="choropleth",
            geojson=geo, 
            locations=~region,
            z=~Adult,
            featureidkey="properties.NAME_1")%>%
  layout(geo=g,
         title="incomes of Adults in Sweden")
adultp

From the “Young” plot, it seems that the people who work in or around Stockholm and Halland earn much more money than other places. The pattern is almost similar for adults in the “Adult” plot.

In addition, adult individuals have relatively higher incomes in South Sweden compared to North Sweden. But this pattern is not very significant among young people.

Question5

#get the coordinates of Linkoping from the website result
linkoping<-read.csv("linkoping.csv",encoding = "latin1")
#add linkoing city to the choropleth map
youngp%>%add_markers(x=linkoping$longitude,
                     y=linkoping$latitude,
                     text=~paste(linkoping$name),
                     color = I("red"))

Appendix

knitr::opts_chunk$set(echo = TRUE)
rm(list=ls())
library(ggplot2)
library(plotly)
library(dplyr)
library(rjson)
library(akima)
library(tidyr)

#Setup, Note: mapbox need registration
Sys.setenv('MAPBOX_TOKEN' = 'pk.eyJ1IjoieWloY2g4ODMiLCJhIjoiY2xtaG5zdHVsMDB6OTNwbzBqb2huYnA5eiJ9.bgRuRHPwhtgG63fFv0TSRQ')

#Q11
#read data
q1data = read.csv("aegypti_albopictus.csv")
#get data from different year
data_2004 = q1data[which(q1data$YEAR=="2004"),]
data_2013 = q1data[which(q1data$YEAR=="2013"),]

plot_2004 <- plot_mapbox(data_2004,x = ~X,y = ~Y,type = 'scatter', mode = 'markers', color = ~VECTOR, text = ~COUNTRY,  colors = c("orange", "blue")) %>% layout(title = 'Distribution of the two types of mosquitos in Year 2004')

plot_2013 <- plot_mapbox(data_2013,x = ~X,y = ~Y,type = 'scatter', mode = 'markers', color = ~VECTOR, text = ~COUNTRY,  colors = c("orange", "blue"))  %>% layout(title = 'Distribution of the two types of mosquitos in Year 2013')
plot_2004
plot_2013 
#Q12
#calculate the numbers of mosquitos per country
Z <- q1data %>%
  group_by(COUNTRY_ID) %>%
  summarise(count = n())

#Plot it with reds color palette for better visualization
plot_q12<-plot_geo(Z)%>%add_trace(z = ~count, color = ~count, text = ~COUNTRY_ID, 
                                        locations = ~COUNTRY_ID, colors = 'Reds')%>% layout(
                                          geo = list(projection = list(type = "Equirectangular")))%>% 
                        layout(title = 'Numbers of mosquitos per country') 
                                          
plot_q12 
#Q13
#Equirectangular projection plot
plot_q13_equi<-plot_geo(Z)%>%add_trace(z = ~log(count), color = ~log(count), text = ~COUNTRY_ID, 
                                  locations = ~COUNTRY_ID, colors = 'Reds')%>% layout(
                                    geo = list(projection = list(type = "Equirectangular"))) %>% 
                        layout(title = 'Equirectangular projection plot with log(Z)') 
plot_q13_equi

#Conic equal area projection plot
plot_q13_conic<-plot_geo(Z)%>%add_trace(z = ~log(count), color = ~log(count), text = ~COUNTRY_ID, 
                                   locations = ~COUNTRY_ID, colors = 'Reds')%>% layout(
                                     geo = list(projection = list(type = "conic equal area")))%>% 
                        layout(title = 'Conic equal area projection plot with log(Z)')  

plot_q13_conic
#Use contry ID to filter only Brazil
bra_2013 <-  data_2013[which(data_2013$COUNTRY_ID=="BRA"),]
# Create X1 Y1, N and so on
bra_2013_summary <- bra_2013%>%
  mutate(X1 = cut_interval(X, n = 100))%>%
  mutate(Y1 = cut_interval(Y, n = 100))%>%
  group_by(X1, Y1) %>%
  summarize(
    mean_X = mean(X),
    mean_Y = mean(Y),
    n = n(),
  .groups = 'drop'
  )
#using MapBox to plot
plot_mapbox(bra_2013_summary,x = ~mean_X,y = ~mean_Y,color =~n,type = 'scatter', mode = 'markers')%>%
  layout(
    mapbox = list(
      zoom = 2,
      center = list(lat = -15, lon = -55)
    )
  )
geo<-fromJSON(file="gadm41_SWE_1.json")
swe_income<-read.csv("swe_income.csv",encoding = "latin1")
income<-data.frame(region=unique(swe_income$region),
                   Young=swe_income$X2016[swe_income$age=="18-29 years"],
                   Adult=swe_income$X2016[swe_income$age=="30-49 years"],
                   Senior=swe_income$X2016[swe_income$age=="50-64 years"])

head(income)

violin<-plot_ly(swe_income, x=~factor(age), y=~X2016, split=~factor(age),
            type="violin", box=list(visible=T))%>%
  layout(title="mean income distributions per age group",
         xaxis=list(title="age"),
         yaxis=list(title="mean incomes"))
violin


#using the template from course website
attach(income)
s=interp(x=Young,y=Adult,z=Senior, duplicate = "mean")
detach(income)

surface<-plot_ly(x=~s$x, y=~s$y, z=~s$z, type="surface")%>%
  layout(title="dependence of Senior incomes on Adult and Young incomes",
         scene=list(xaxis=list(title="Young"),
                    yaxis=list(title="Adult"),
                    zaxis=list(title="Senior")))%>%
  colorbar(title="Senior")
surface
#See the structure of some region:
geo<-fromJSON(file="gadm41_SWE_1.json")
print(geo$features[[2]]$properties)

library(stringr)

#change the region contents, only keep the name
newregion<-str_remove_all(income$region,"county")
newregion<-gsub("[[:digit:]]+",'',newregion)
newregion<-str_remove_all(newregion," ")
newincome<-income
newincome$region<-newregion
newincome$region[14]<-"Orebro"
#plot choropleth graph
g=list(fitbounds="locations", visible=FALSE)
youngp<-plot_geo(newincome)%>%
  add_trace(type="choropleth",
            geojson=geo,
            locations=~region,
            z=~Young,
            featureidkey="properties.NAME_1")%>%
  layout(geo=g,
         title="incomes of Young in Sweden")
  
youngp

g=list(fitbounds="locations", visible=FALSE)
adultp<-plot_geo(newincome)%>%
  add_trace(type="choropleth",
            geojson=geo, 
            locations=~region,
            z=~Adult,
            featureidkey="properties.NAME_1")%>%
  layout(geo=g,
         title="incomes of Adults in Sweden")
adultp

#get the coordinates of Linkoping from the website result
linkoping<-read.csv("linkoping.csv",encoding = "latin1")
#add linkoing city to the choropleth map
youngp%>%add_markers(x=linkoping$longitude,
                     y=linkoping$latitude,
                     text=~paste(linkoping$name),
                     color = I("red"))